ABSTRACT 

We propose a method for detecting obstacles by 
comparing input and reference train frontal view 
camera images. In the field of obstacle detection, most 
methods employ a machine learning approach, so they 
can only detect pre-trained classes, such as pedestrians 
and bicycles. Obstacles of unknown classes 
cannot be detected. To overcome this problem, we 
propose a background subtraction method that can 
be applied to moving cameras. First, the proposed 
method computes frame-by-frame correspondences 
between the current and the reference image 
sequences. Then, obstacles are detected by applying 
image subtraction to corresponding frames. To 
confirm the effectiveness of the proposed method, we 
conducted an experiment using several image 
sequences captured on an experimental track. The 
results showed that the proposed method could detect 
various obstacles accurately and effectively. 

Keywords: Railway safety, Object detection, Subtraction techniques 

1. INTRODUCTION 

Railway accidents caused by obstacles are one of the 
most important issues that should be solved. There is 
a demand for obstacle detection systems, and 
accordingly, a surveillance system for detecting 
obstacles in level crossings has been developed [1]. 
However, the area that can be monitored by this 
system is restricted due to a fixed camera. On the 
other hand, various sensing devices can be used for 
obstacle detection by being mounted on the front of a 
train. Since these devices do not require large 
modifications to the current railway system, 
especially to ground-side facilities, they may be easily 
introduced. Therefore, obstacle detection methods 
using frontal view sensors are in demand [2, 3, 4, 5, 6, 7]. 
However, in the case of railways, distant obstacles 
must be detected since the braking distance of a train 
is very long. Therefore, using millimetre-wave 
RADAR and LIDAR is not an option due to their low 
resolutions. In addition, using multiple sensors 
increases the cost. From this point of view, a train 
frontal view camera can be considered a suitable option 
for obstacle detection in a railway system. Object 
detection by camera is one of the most active research 
areas in the computer vision field, and numerous 
methods have been proposed [6, 7, 8, 9, 10]. Most 
methods employ a machine learning approach, and they 
can detect pre-trained objects, such as pedestrians 
and bicycles. However, unknown objects cannot be 
detected by these methods. Although background 
subtraction could be a solution, it cannot be simply 
applied to a train frontal view camera, since the camera 
moves together with the train. Therefore, it is important 
to develop a method for forward obstacle detection 
based on background subtraction. Background subtraction 
methods for moving cameras exist, but most of them use 
a single image sequence, so only moving objects can be 
detected [11, 12]. Meanwhile, Kyutoku et al. proposed 
a method for detecting general obstacles with a 
car-mounted camera by subtracting the current image 
sequence from the reference (database) image 
sequence [13]. By assuming that these two image 
sequences are captured on slightly different driving 
paths, this method succeeded in accurately aligning the 
two image sequences with its metric. This assumption 
requires sufficient base-line length between cameras 
capturing the two image sequences to compute the 
metric between the sequences. However, in the case 
of railway, sufficient base-line length cannot be 
obtained since trains always run on the same tracks. In 
addition, since this method only aligns road surfaces 
between two image sequences, a large registration 
error occurs outside the road surface. Thus, distant or 
small obstacles cannot be distinguished accurately due to 
noise caused by the image registration error. 
Therefore, we propose a moving camera background 
subtraction method that detects obstacles 
by comparing input and reference images. The 
contributions of this paper are: 

1. Introduction of a new metric that can align two 
image sequences even if the base-line length 
between two cameras is small. 

2. Detection of arbitrary distant obstacles by pixel- 
wise image registration and integration of multiple 
image subtraction mechanisms. 



Fig. 1. Framework of the proposed method. 


2. MOVING CAMERA BACKGROUND 
SUBTRACTION FOR OBSTACLE DETECTION 

To detect obstacles by subtracting two image 
sequences, pixel-level alignment is needed. In the 
case of a train frontal view camera, since an image 
sequence is captured from a moving train, two image 
sequences must be aligned both spatially and 
temporally. To solve this, the proposed method first 
finds a reference frame captured at the most similar 
location to the current frame by image sequence 
matching. Then, it performs pixel-wise registration 
between the current frame and its corresponding 
reference frame. Finally, multiple image subtraction 
methods are applied to compute the image difference 
between the two frames, and obstacles are detected by 
integrating their outputs. Figure 1 shows the 
framework of the proposed method. 

2.1. Temporal alignment: Computation of 
frame-by-frame correspondences 

In the case of railway, train frontal view cameras 
always take the same trajectory since trains run on the 
same track. This results in a very short base-line 
length between cameras of the current and the 
reference image sequences. To cope with this 
situation, the proposed method introduces a new 
metric to align the two image sequences. Figure 2 
shows close and distant train frontal views of the 
current and the reference image sequences. Let the 
current and the reference image sequences be 
F = {f_1, f_2, ..., f_p} and G = {g_1, g_2, ..., g_q}, 
respectively. Here, f_i denotes the i-th frame of the 
current image sequence, and g_j denotes the j-th frame 
of the reference image sequence. First, the proposed method 
computes the frame-by-frame correspondences 
between sequences F and G. Next, the distance d(i, j) 
between the current frame f_i and the reference frame 
g_j is calculated from the variance of the key-point 
angles, where n_ij is the number of 
corresponding key-point pairs between f_i and g_j, θ_ijk is 
the angle of the k-th key-point pair represented in the 
polar coordinate system, m_ij is the mean of θ_ijk, and α 
is a positive constant. Here, the angle is represented 
as the relative angle from the x-axis. With this metric, if 
the current frame is captured at a camera position 
close to the reference frame, the variance becomes 
small. Moreover, it can be computed regardless of the 
base-line length between two cameras. Finally, frame 
correspondences (f_i, g_j) between the current and the 
reference image sequences are obtained by applying 
Dynamic Time Warping to minimize d(i, j). Figure 3 
shows an example of corresponding frames of the 
current and the reference image sequences. 
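
The temporal alignment step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that the angle of a key-point pair is the direction of the displacement between the two matched points, that the distance is the plain variance of those angles, and that the positive constant (alpha in the code) is returned when no key-point pairs are available, since its exact role is not recoverable from the text. The function names are hypothetical, and key-point detection and matching are left out.

```python
import math


def angle_variance_distance(pairs, alpha=1.0):
    """Distance d(i, j) between two frames from matched key-points.

    pairs: list of ((x_cur, y_cur), (x_ref, y_ref)) matched key-points.
    alpha: ASSUMPTION, returned when there are no pairs to compare.
    """
    if not pairs:
        return alpha
    # Angle of each displacement vector, measured from the x-axis.
    angles = [math.atan2(yr - yc, xr - xc) for (xc, yc), (xr, yr) in pairs]
    mean = sum(angles) / len(angles)
    # Plain variance; wrap-around at +/-pi is ignored in this sketch.
    return sum((a - mean) ** 2 for a in angles) / len(angles)


def dtw_path(cost):
    """Dynamic Time Warping over cost[i][j] = d(i, j).

    Returns the monotonic path of (i, j) frame correspondences with the
    smallest accumulated cost, from (0, 0) to the last frame pair.
    """
    p, q = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * q for _ in range(p)]
    acc[0][0] = cost[0][0]
    for i in range(p):
        for j in range(q):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = p - 1, q - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path
```

In this sketch, key-points that all shift in the same direction give a distance of zero, while displacements that point in many directions, as happens when one camera is ahead of the other, give a large distance.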

2.2. Spatial alignment: Computation of pixel-wise 
image registration for temporally aligned 
frames 

To obtain accurate image alignment, the proposed 
method performs pixel-wise image registration 
between the corresponding frames f_i and g_j obtained in the 
previous step. Here, DeepFlow [14] is used for 
calculating the deformation field from g_j to f_i. Then, 
the completely aligned image g'_j is obtained by applying 
the deformation field to g_j. Figure 4 shows the 
absolute image difference between the frames in Fig. 3. 
Figure 4(a) shows the image difference between the 
original frames, |f_i - g_j|, and Fig. 4(b) shows the image 
difference after pixel-wise alignment, |f_i - g'_j|. In these 
images, darker pixels indicate larger image errors. 
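
Applying a deformation field to the reference frame can be illustrated with a small Python sketch. It does not reimplement DeepFlow: the flow is taken as a given input, and the sketch assumes a backward-mapping convention in which each pixel of the current frame stores the offset to its matching location in the reference frame. Nearest-neighbour sampling, border clamping, and the function names are simplifications chosen for illustration.

```python
def warp_reference(ref, flow):
    """Warp a reference image so that it lines up with the current frame.

    ref:  2-D list of grey values, indexed as ref[y][x].
    flow: 2-D list of (dx, dy) offsets; ASSUMPTION: flow[y][x] points from
          pixel (x, y) of the current frame to its match in the reference.
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Nearest-neighbour lookup, clamped to the image border.
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            out[y][x] = ref[sy][sx]
    return out


def abs_difference(a, b):
    """Pixel-wise absolute difference of two equally sized images."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
```

For a reference that is simply shifted by one pixel relative to the current frame, the difference is non-zero before warping and vanishes after warping with a constant flow of (-1, 0).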

2.3. Image subtraction for completely aligned 
images 

Robustness against lighting conditions is one of the 
most important issues when developing a system for 
railways, since it needs to handle various 
environments. Here, multiple image subtraction 
metrics are combined to solve this problem. First, two 
types of image subtraction metrics are calculated from 
f_i and the aligned reference frame. The first one is the 
Normalized Vector Distance (NVD), calculated as 
NVD(a, b) = | a/|a| - b/|b| |. 
Here, a and b are image patches represented as 
vectors consisting of RGB channels. The second one 
is the Radial Reach Filter (RRF) proposed by Satoh et al. 
[15]. RRF is calculated by comparing the intensity of 
each RGB channel between the target pixel and its 
surroundings. Next, to reduce noise, a Gaussian filter is 
applied to the difference images obtained by NVD and 
RRF. Then, two binary images d1_ij and d2_ij are 
obtained by thresholding. Here, the threshold T for the 
binarization is determined as T = μ_ij + n σ_ij, where 
μ_ij and σ_ij are the average and the variance of each 
difference image, respectively. Finally, the extracted 
pixels are considered as candidates of obstacles. 
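
The NVD metric and the thresholding step can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: NVD is written in its usual form as the distance between the two length-normalized patch vectors, the spread term follows the text in using the variance of the difference image, and the coefficient n has no value given in the text, so it is a required argument. Gaussian filtering and RRF are omitted, and the function names are hypothetical.

```python
import math


def nvd(a, b, eps=1e-12):
    """Normalized Vector Distance between two patch vectors a and b."""
    # eps only guards against division by zero for all-black patches.
    norm_a = max(math.sqrt(sum(v * v for v in a)), eps)
    norm_b = max(math.sqrt(sum(v * v for v in b)), eps)
    return math.sqrt(sum((x / norm_a - y / norm_b) ** 2 for x, y in zip(a, b)))


def binarize(diff, n):
    """Threshold a difference image with T = mean + n * variance.

    diff: 2-D list of difference values.
    n:    coefficient of the spread term (not specified in the text).
    """
    values = [v for row in diff for v in row]
    mu = sum(values) / len(values)
    # Variance, following the text's definition of the spread term.
    sigma = sum((v - mu) ** 2 for v in values) / len(values)
    threshold = mu + n * sigma
    return [[1 if v > threshold else 0 for v in row] for row in diff]
```

Because both vectors are normalized before comparison, a patch and a uniformly brighter copy of it give an NVD close to zero, which is the property that makes the metric tolerant of lighting changes.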

3. EXPERIMENTS AND DISCUSSIONS 

To evaluate the effectiveness of the proposed method, 
we prepared train frontal view images captured on a 
test line in the premises of the Railway Technical 
Research Institute, Japan. A Grasshopper3 camera (Point Grey 
Research, Inc.) was mounted at a height of 2.5 m on the 
front of a railway trolley. The size of the captured 
images was 1,920 x 1,440 pixels, and the frame rate 
was 10 fps. The focal length of the camera was 25 
mm, and the pixel pitch was 4.54 μm. In this 
experiment, the railway trolley was controlled 
manually. The dataset contained a total of 2,117 frames 
and was constructed by extracting every fifth frame 
from five recorded videos. No obstacle was present in 
three videos, and the 
other two videos included a pedestrian and a box as 
obstacles, respectively. Bounding boxes of all 
obstacles were annotated manually. One of the videos 
without obstacles was used as the reference image 
sequence, and the other videos were used as the 
current image sequences. 
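
To give a feel for what this camera setup means for distant obstacles, the following Python sketch converts the stated focal length and pixel pitch into image-space quantities. It assumes an ideal pinhole camera without lens distortion. The 1.7 m object height and 100 m distance in the usage note are hypothetical values chosen for illustration and do not come from the experiment.

```python
import math

FOCAL_LENGTH_M = 25e-3    # focal length stated in the text
PIXEL_PITCH_M = 4.54e-6   # pixel pitch stated in the text
IMAGE_WIDTH_PX = 1920     # image width stated in the text


def focal_length_px():
    """Focal length expressed in pixels (pinhole model)."""
    return FOCAL_LENGTH_M / PIXEL_PITCH_M


def projected_size_px(object_size_m, distance_m):
    """Approximate image size in pixels of an object at a given distance."""
    return focal_length_px() * object_size_m / distance_m


def horizontal_fov_deg():
    """Horizontal field of view implied by the image width and pixel pitch."""
    half_width_m = IMAGE_WIDTH_PX * PIXEL_PITCH_M / 2
    return math.degrees(2 * math.atan(half_width_m / FOCAL_LENGTH_M))
```

Under these assumptions the focal length is roughly 5,500 pixels, so a hypothetical 1.7 m tall object at 100 m, computed as projected_size_px(1.7, 100), would span about 94 pixels in height.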

4. CONCLUSIONS 

This paper proposed a method of moving camera 
background subtraction for forward obstacle detection 
from a train frontal view camera. To detect general 
obstacles, frame-by-frame correspondences between 
the current and the reference image sequences of train 
frontal view were computed based on the angle 
difference of corresponding key-points. After pixel-wise 
image registration, obstacles were detected by 
integrating two kinds of subtraction methods. To 
demonstrate the effectiveness of the proposed method, 
experiments were conducted by capturing train frontal 
view image sequences on an experimental track. The 
results showed the effectiveness of the proposed 
method. Future work includes the introduction of a 
background modelling method and evaluation under 
various lighting conditions, seasons, and weather conditions. 